Recognition Confidence Measures: Detection of Misrecognitions and Out-Of-Vocabulary Words

Author

  • Sheryl R. Young
Abstract

This paper describes and evaluates a new technique for measuring confidence in word strings produced by speech recognition systems. It detects misrecognized and out-of-vocabulary words in spontaneous spoken dialogs. The system uses multiple, diverse knowledge sources including acoustics, semantics, pragmatics and discourse to determine if a word string is misrecognized. When likely misrecognitions are detected, a series of tests distinguishes out-of-vocabulary words from other error sources. The work is part of a larger effort to automatically recognize and understand new words when spoken in a spontaneous spoken dialog. We describe a system that combines newly developed acoustic confidence measures with the semantic, pragmatic and discourse structure knowledge embodied in the MINDS-II system. The newly developed acoustic confidence metrics output independent probabilities that a word is recognized correctly along with a measure of how reliably we can estimate if a word is wrong. The acoustic confidence metrics are derived from normalized acoustic recognition scores. The acoustic scores are normalized by estimates of the denominator of Bayes' equation. To evaluate the utility of using the acoustic techniques together with higher-level constraints, the preliminary system restricted component interaction. Words with normalized acoustic scores that had a 95% or greater probability of being incorrect were flagged prior to being input to the MINDS-II analysis module. For this study, MINDS-II independently used its higher-level knowledge to detect recognition errors that were semantically or contextually inappropriate. Misrecognized word strings were then re-recognized using an RTN-based speech decoder guided by a dynamically derived, highly constrained grammar that restricts the possible words that can be matched, biasing the recognizer against illogical and highly improbable content.
Speaker goals and plans, contextual appropriateness, discourse and spontaneous speech structure are all considered in the derivation of grammars. A grammar is dynamically derived for each string of misrecognized words encountered within an utterance and essentially defines a set of semantic content predictions for the word string. Although a rudimentary procedure was employed to estimate the conjoined effect of merging these two knowledge sources, the results indicate that the conjoined usage of normalized acoustic confidence measures of accuracy and the higher-level semantic, pragmatic and discourse-level constraints embodied in the MINDS-II system enables the larger system to overcome the weaknesses of each individual technique. The techniques detect complementary phenomena. Significantly more recognition errors and out-of-vocabulary words are detected when both of these techniques are used together than when either is used alone. Alone, the acoustic methods can only detect between 2/3 and 3/4 of the recognition errors. Similarly, the higher-level constraint-based methods cannot detect contextually consistent misrecognitions. Together, however, the acoustic methods detect the significant misrecognized content words that are missed by the higher-level, knowledge-based techniques. Further, the knowledge-based techniques detect most of the mid-utterance corrections and interjections as well as many of the confusable and small words that are missed by the acoustic methods. Current work focuses upon development of more sophisticated techniques for conjoining these two knowledge sources.
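
The acoustic flagging step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the denominator of Bayes' equation is estimated or how the two knowledge sources were merged. The sketch assumes the denominator is approximated by summing the scores of competing hypotheses for the same speech segment, treats the resulting posterior directly as the probability that the word is correct, and merges the two detectors with a plain set union. All function names and the example scores are hypothetical.

```python
import math


def normalized_posterior(word_log_score, competitor_log_scores):
    """Approximate P(word | acoustics) for one recognized word.

    Assumption: the Bayes denominator P(acoustics) is estimated by summing
    the joint scores of the word and its competing hypotheses.
    """
    all_scores = [word_log_score] + list(competitor_log_scores)
    peak = max(all_scores)
    # log-sum-exp keeps the sum numerically stable in the log domain
    log_denominator = peak + math.log(sum(math.exp(s - peak) for s in all_scores))
    return math.exp(word_log_score - log_denominator)


def flag_acoustic_errors(hypotheses, threshold=0.95):
    """Return indices of words whose estimated P(incorrect) >= threshold.

    Each hypothesis is (word, log_score, competitor_log_scores).
    The 0.95 default mirrors the 95% cutoff mentioned in the abstract.
    """
    flagged = set()
    for index, (_word, log_score, competitors) in enumerate(hypotheses):
        p_incorrect = 1.0 - normalized_posterior(log_score, competitors)
        if p_incorrect >= threshold:
            flagged.add(index)
    return flagged


def combine_flags(acoustic_flags, knowledge_flags):
    """Union of both detectors (a stand-in for the unspecified merge step)."""
    return set(acoustic_flags) | set(knowledge_flags)


# Hypothetical scores: the second word is strongly outscored by its competitors.
hypotheses = [
    ("show", -10.0, [-14.0, -15.0]),
    ("fares", -20.0, [-16.0, -16.5]),
    ("please", -8.0, [-9.0]),
]
acoustic = flag_acoustic_errors(hypotheses)
# Suppose a higher-level module independently flagged word 2 as contextually odd.
suspect_words = combine_flags(acoustic, {2})
```

A union is the simplest way to reflect the abstract's observation that the two techniques detect complementary errors; a weighted or learned combination would be a natural refinement.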





Publication date: 1994